SportsStats Olympic Athletes Analysis
At a glance
- Project: Data cleaning and analysis
- Date: June 2025
- Category: Data analysis
Project overview
This project analyzes 120 years of Olympic Games data (1896–2016) for patterns in performance, participation, and country results. Using Python and SQL, it explores demographics, medals, and how physical attributes vary by sport.
The work highlights how gender participation has changed, which countries lead medal counts, and how height, weight, and age vary by sport. Together these findings support a narrative on Olympic history and athletic excellence.
Dataset / source
Historical Olympic athlete data (e.g., athlete_events.csv with 271K+ rows) and noc_regions.csv for NOC-to-region mapping, spanning 1896–2016.
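As a rough sketch of how the two files might be combined, the example below left-joins athlete rows to the NOC lookup and lists codes with no matching region. The column names (ID, Sex, NOC, Medal, region) and the helper attach_regions are assumptions for illustration, not confirmed details of this project. Tiny inline frames stand in for the real CSV files.

```python
import pandas as pd

# Hypothetical stand-ins for athlete_events.csv and noc_regions.csv.
# Column names are assumed, not taken from the project itself.
athletes = pd.DataFrame({
    "ID": [1, 2, 3],
    "Sex": ["M", "F", "M"],
    "NOC": ["USA", "JAM", "XXX"],
    "Medal": ["Gold", None, "Bronze"],
})
regions = pd.DataFrame({
    "NOC": ["USA", "JAM"],
    "region": ["USA", "Jamaica"],
})


def attach_regions(athletes: pd.DataFrame, regions: pd.DataFrame) -> pd.DataFrame:
    """Left-join athlete rows to the NOC lookup, keeping every athlete row."""
    return athletes.merge(
        regions[["NOC", "region"]],
        on="NOC",
        how="left",
        validate="many_to_one",  # fail loudly if the lookup has duplicate NOC codes
    )


merged = attach_regions(athletes, regions)

# NOC codes that found no region are worth reviewing during cleaning.
unmatched = merged.loc[merged["region"].isna(), "NOC"].unique().tolist()
print(unmatched)
```

A left join keeps athlete rows whose NOC code is missing from the lookup, so gaps show up as empty regions rather than silently dropped rows.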
Tools used
Python, pandas, matplotlib, DuckDB (SQL), Jupyter-style notebook workflow.
Problem solved
A large, messy athlete-level history needed to be summarized into reliable answers about medals, participation, gender trends, sport-level physical profiles, and country efficiency.
Key insights
- Medals: USA, Russia, and Germany lead total medal counts in the cleaned sample (e.g., USA 4,383; Russia 3,610; Germany 3,189).
- Gender participation: Male vs female athlete counts over time show substantial growth in women’s participation, especially after the 1980s.
- Physical profiles: Sport-level averages differ widely (e.g., basketball vs rhythmic gymnastics age profiles).
- Efficiency: Medals-per-athlete metrics highlight efficient smaller programs (e.g., Jamaica at about 1.9 medals per athlete in the sample shown).
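The medal, participation, and efficiency figures above could be computed along the following lines. This is a minimal sketch on made-up rows, assuming one row per athlete-event entry with ID, Sex, Year, region, and Medal columns. The function names are hypothetical and the project's actual queries may differ.

```python
import pandas as pd

# Made-up sample rows: one row per athlete-event entry (assumed schema).
rows = pd.DataFrame({
    "ID":     [1, 1, 2, 3, 4, 5],
    "Sex":    ["M", "M", "F", "F", "M", "F"],
    "Year":   [1984, 1984, 1984, 2016, 2016, 2016],
    "region": ["USA", "USA", "USA", "Jamaica", "Jamaica", "USA"],
    "Medal":  ["Gold", "Silver", None, "Gold", None, "Bronze"],
})


def medals_by_region(df: pd.DataFrame) -> pd.Series:
    """Count medal-winning rows per region, largest first."""
    won = df.dropna(subset=["Medal"])
    return won.groupby("region").size().sort_values(ascending=False)


def athletes_by_year_and_sex(df: pd.DataFrame) -> pd.DataFrame:
    """Distinct athletes per Games year, one column per sex."""
    return df.groupby(["Year", "Sex"])["ID"].nunique().unstack(fill_value=0)


def medals_per_athlete(df: pd.DataFrame) -> pd.Series:
    """Medal rows divided by distinct athletes, per region."""
    medals = df.groupby("region")["Medal"].count()  # count() skips missing medals
    athletes = df.groupby("region")["ID"].nunique()
    return (medals / athletes).sort_values(ascending=False)


print(medals_by_region(rows))
print(athletes_by_year_and_sex(rows))
print(medals_per_athlete(rows))
```

Counting medal rows credits a team medal once per team member, and one athlete can win several medals, so a medals-per-athlete ratio above 1 is possible under this definition.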